Labor and Education Data: Taking a Look at all the Variables and the Counties In the Convex Hull¶

What is a Convex Hull?

A convex hull is the smallest convex polygon that completely encloses a set of points in two-dimensional space (or the smallest convex polyhedron in three-dimensional space). A useful mental picture: stretch a rubber band around the points and let it go; it snaps tightly around the outermost points, tracing the hull.

The concept of convex hulls has applications in various fields such as computer science, computational geometry, operations research, and data analysis. It is often used in algorithms to solve problems such as finding the shortest path between two points, identifying the optimal location of a facility, and identifying outliers in a dataset.

What do the Interior Points from a Convex Hull Tell us?

The interior points of a convex hull tell us about the relationship between the points and the hull itself. Points that fall well inside the hull are tightly clustered, without significant gaps or variation in their arrangement, which suggests they are similar in some way, such as sharing characteristics or belonging to the same category or group.

In contrast, if there are many points outside the convex hull, this indicates that the points are more widely dispersed and may have more significant differences or variations among them.
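The rubber-band picture can be made concrete with a tiny example (made-up points, not the county data): SciPy's `ConvexHull` reports which points form the boundary, and everything else is interior.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Six made-up 2-D points: four corners of a square plus two points inside it
points = np.array([[0, 0], [4, 0], [4, 4], [0, 4], [2, 2], [1, 3]])
hull = ConvexHull(points)

boundary_idx = set(hull.vertices)                  # indices of points on the hull
interior_idx = set(range(len(points))) - boundary_idx
print(sorted(interior_idx))  # the two points inside the square are interior
```

Here the square's corners form the hull, so the remaining two points are interior.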

In [1]:
import numpy as np #linear algebra
import pandas as pd #data manipulation and analysis
import matplotlib.pyplot as plt #data visualization
import seaborn as sns #data visualization
from scipy.spatial import ConvexHull, convex_hull_plot_2d
import matplotlib.path as mpath
import plotly.express as px
import plotly.graph_objects as go
import textwrap
from urllib.request import urlopen
import json
In [2]:
# Read-in dataframe with cluster ids from KMeans Clustering Analysis
cluster2_df = pd.read_csv("laborEduClusterData.csv", index_col=0)
cluster2_df.head()
Out[2]:
Bachelor's degree or higher of persons age 25 years+, 2017-2021 With a disability, under age 65 years, 2017-2021 Persons without health insurance, under age 65 years In civilian labor force, total of population age 16 years+, 2017-2021 In civilian labor force, female of population age 16 years+, 2017-2021 Total retail sales per capita, 2017 Mean travel time to work (minutes), workers age 16 years+, 2017-2021 Median household income (in 2021 dollars), 2017-2021 Per capita income in past 12 months (in 2021 dollars), 2017-2021 Persons in poverty Total employer establishments, 2020 Total employment, 2020 Total annual payroll, 2020 ($1,000) Population per square mile, 2020 Land area in square miles, 2020 cluster_id Banned or not County Name
0 4406.72 4434.26 2588.95 14184.13 14018.88 9461 36.1 44467 24539 5205.44 385 4572 167427 47.1 583.87 0 0.0 Adams County, Ohio
1 19418.97 12912.09 8540.28 61917.03 58765.26 16266 19.5 55114 28671 15555.51 2286 45012 1983272 253.9 402.55 0 1.0 Allen County, Ohio
2 11614.15 5179.28 4708.44 32592.87 29663.17 9431 24.2 58168 28992 5702.44 1034 18234 701075 124.0 422.99 0 0.0 Ashland County, Ohio
3 14697.89 12361.80 9733.70 55190.08 51296.60 10406 25.8 49680 26777 15281.91 1806 24464 913850 139.0 702.07 0 0.0 Ashtabula County, Ohio
4 21161.10 8005.22 5833.26 34316.97 33386.13 11148 21.9 47061 24990 12969.70 1029 13265 469095 124.0 503.64 0 0.0 Athens County, Ohio

All counties in each cluster¶

The functionsAll.py file contains Python code that imports various libraries for data analysis, visualization, and machine learning.

It defines a function called split_dataframe_by_cluster that takes a pandas DataFrame and the 'cluster_id' column as inputs and splits the DataFrame into multiple DataFrames based on the unique values in the 'cluster_id' column.

Another function called get_cluster_coords_dict is defined, which takes two inputs, a cluster DataFrame and the original DataFrame, and returns a dictionary of coordinate points for all the counties in each cluster. These functions are used to create a list of coordinate dictionaries for each cluster in the data.

The code then defines three functions, cluster0, cluster1, and cluster2, each of which takes a dictionary of cluster coordinates as input and returns the coordinates for a specific cluster.

Finally, the code defines a function called clusterk_dict that takes a list of cluster DataFrames and a list of coordinate dictionaries as inputs and returns a dictionary where each key is a cluster number and each value holds the coordinate points for that cluster's counties.
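functionsAll.py itself is not shown here, but split_dataframe_by_cluster could plausibly be as simple as the following sketch (the demo DataFrame is hypothetical, not the county data):

```python
import pandas as pd

# A plausible sketch of split_dataframe_by_cluster (the real functionsAll.py is
# not reproduced here): one DataFrame per unique value in the cluster column.
def split_dataframe_by_cluster(df, cluster_col):
    return [df[df[cluster_col] == k].copy()
            for k in sorted(df[cluster_col].unique())]

# Hypothetical example data
demo = pd.DataFrame({"x": [1, 2, 3, 4], "cluster_id": [0, 1, 0, 2]})
parts = split_dataframe_by_cluster(demo, "cluster_id")
print(len(parts))  # one DataFrame per cluster id: 3
```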

In [3]:
from functionsAll import split_dataframe_by_cluster, get_cluster_coords_dict,coords, clusterk_dict,\
cluster0, cluster1, cluster2
In [4]:
# Step 1
df_list = split_dataframe_by_cluster(cluster2_df, 'cluster_id')

# Step 2
coords_list = coords(df_list, cluster2_df)

# Step 3
clusterK_dict = clusterk_dict(df_list, coords_list)
In [5]:
for i in range(len(df_list)):
    if len(df_list) ==1:
        cluster0_dict = cluster0(clusterK_dict)
    elif len(df_list) ==2:
        cluster0_dict = cluster0(clusterK_dict)
        cluster1_dict = cluster1(clusterK_dict)
    elif len(df_list) ==3:
        cluster0_dict = cluster0(clusterK_dict)
        cluster1_dict = cluster1(clusterK_dict)
        cluster2_dict = cluster2(clusterK_dict)
        
    print(f"cluster{i}_dict has been created")


cluster0_dict has been created
cluster1_dict has been created
cluster2_dict has been created

Banned counties in each cluster¶

The functionsBanned.py file contains Python code that imports several libraries/modules for data analysis and visualization, machine learning, and geometry operations. The code defines several functions that take a DataFrame as input and perform operations to extract information about clusters of banned counties based on their coordinates.

The filter_banned_counties function filters out clusters of banned counties from a DataFrame based on their cluster ID and the value of a "Banned or not" column. It returns a list of DataFrames containing banned counties if there are more than two banned counties in a cluster or None if there are no banned counties.

The get_banned_cluster_coords_dict function creates a dictionary of coordinate pairs for each pair of columns in the banned counties DataFrame.

The bannedCoords function takes a list of banned county DataFrames and the original DataFrame and creates a list of dictionaries containing coordinate pairs for each pair of columns in the banned county DataFrame.

The clusterk_dict_banned function takes a list of banned county DataFrames and a list of dictionaries containing coordinate pairs. It creates a dictionary with keys representing each banned county cluster and values containing the respective coordinate pairs.

Finally, the cluster0Banned, cluster1Banned, and cluster2Banned functions return the coordinate pairs for the banned counties in clusters 0, 1, and 2, respectively, based on the input dictionary.
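functionsBanned.py is not reproduced here either; a rough sketch of how filter_banned_counties might behave, per the description above (the demo data and the hard-coded column names are assumptions):

```python
import pandas as pd

# A rough sketch of filter_banned_counties: per cluster, keep the banned rows,
# but only for clusters with more than two banned counties; return None if no
# cluster qualifies. This mirrors the described behavior, not the real file.
def filter_banned_counties(df):
    out = []
    for k in sorted(df["cluster_id"].unique()):
        banned = df[(df["cluster_id"] == k) & (df["Banned or not"] == 1.0)]
        if len(banned) > 2:
            out.append(banned.copy())
    return out or None

# Hypothetical example: cluster 0 has three banned counties, cluster 1 only one
demo = pd.DataFrame({
    "cluster_id": [0, 0, 0, 0, 1, 1],
    "Banned or not": [1.0, 1.0, 1.0, 0.0, 1.0, 0.0],
})
result = filter_banned_counties(demo)
print(len(result))  # only cluster 0 qualifies
```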

In [6]:
from functionsBanned import filter_banned_counties, get_banned_cluster_coords_dict, bannedCoords, clusterk_dict_banned,\
cluster0Banned, cluster1Banned, cluster2Banned
In [7]:
# 1
banned_counties_df = filter_banned_counties(cluster2_df)

# 2
bannedCoords_list = bannedCoords(banned_counties_df, cluster2_df)

# 3
clusterKBanned_dict = clusterk_dict_banned(banned_counties_df, bannedCoords_list)
Cluster0 had enough banned counties to find non-banned counties in the banned counties convex hull.
Cluster2 had enough banned counties to find non-banned counties in the banned counties convex hull.

Now to look at the counties inside the convex hull of the banned counties¶

The countyName.py file is a Python script that imports various libraries/modules such as NumPy, Pandas, Matplotlib, Seaborn, and Scikit-learn to perform data analysis, data visualization, and machine learning tasks. It contains several functions that take a Pandas DataFrame (df) and other input parameters as arguments and perform specific tasks.

The banned_counties_list function returns a list of county names that have been banned based on a column in the DataFrame called "Banned or not."

The countyNames_cluster0, countyNames_cluster1, and countyNames_cluster2 functions each take four inputs: the original data frame, a dictionary (clusterk_dict) containing the x and y coordinates of the counties in each cluster, a dictionary (clusterk_banned_dict) containing the x and y coordinates of the banned counties in each cluster (note that not every cluster will contain a banned county), and finally a list of all the banned counties (bannedCountiesList).

These three functions return a dictionary where keys are tuples (pairs of coordinates) and values are lists of county names that belong to a specific cluster. Each function uses ConvexHull and mpath.Path to find a specific cluster's boundary and return the county names inside that boundary. If a cluster has multiple banned counties, the merge_dicts function merges two dictionaries (dict1 and dict2) and returns the merged dictionary.
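The ConvexHull + mpath.Path pattern described above can be sketched with hypothetical coordinates: build the hull of the banned counties, wrap its boundary in a Path, and keep the names of the counties whose points fall inside it (the coordinates and county names below are made up).

```python
import numpy as np
from scipy.spatial import ConvexHull
import matplotlib.path as mpath

# Hypothetical banned-county coordinates forming a triangle
banned_xy = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])

# Hypothetical non-banned counties to test against the banned hull
other_xy = np.array([[3.0, 1.0], [8.0, 1.0]])
other_names = ["CountyA", "CountyB"]

hull = ConvexHull(banned_xy)
hull_path = mpath.Path(banned_xy[hull.vertices])   # boundary as a closed path

inside = hull_path.contains_points(other_xy)       # point-in-polygon test
names_inside = [n for n, ok in zip(other_names, inside) if ok]
print(names_inside)  # only CountyA sits inside the banned triangle
```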

In [8]:
from countyName import banned_counties_list, countyNames_cluster0, countyNames_cluster1, countyNames_cluster2, merge_dicts
In [9]:
bannedCountiesList=banned_counties_list(cluster2_df)


for i in range(len(banned_counties_df)):
    if len(banned_counties_df) ==1:
        cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
        allCounties=countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)

    elif len(banned_counties_df) ==2:
        cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
        cluster1_banned_dict = cluster1Banned(clusterKBanned_dict)
        
        countyName0=countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)
        countyName1=countyNames_cluster1(cluster2_df, cluster1_dict, cluster1_banned_dict, bannedCountiesList)
        
        allCounties = merge_dicts(countyName0, countyName1)

    elif len(banned_counties_df) ==3:
        cluster0_banned_dict = cluster0Banned(clusterKBanned_dict)
        cluster1_banned_dict = cluster1Banned(clusterKBanned_dict)
        cluster2_banned_dict = cluster2Banned(clusterKBanned_dict)
        
        countyName0=countyNames_cluster0(cluster2_df, cluster0_dict, cluster0_banned_dict, bannedCountiesList)
        countyName1=countyNames_cluster1(cluster2_df, cluster1_dict, cluster1_banned_dict, bannedCountiesList)
        countyName2=countyNames_cluster2(cluster2_df, cluster2_dict, cluster2_banned_dict, bannedCountiesList)

        allCounties = merge_dicts(merge_dicts(countyName0, countyName1), countyName2)
        
    print(f"cluster{i}_banned_dict has been created")

cluster0_banned_dict has been created
cluster1_banned_dict has been created

Now that we have all the column pairs with non-banned counties inside their respective banned-county convex hulls, we can look at which variable pairs contain the most counties in the banned counties' convex hull.

Getting the top variable pairs and their respective counties¶

The topVariableFunctions.py file contains several functions related to data analysis, visualization, and preprocessing for machine learning. Here is a brief explanation for each function:

filtered_var_pairs This function takes a dictionary called "allCounties" as input. It returns a list of key-value pairs where each key is a tuple containing two strings (category and subcategory), and each value is a list of counties. This function filters the original dictionary and extracts the relevant information for further analysis.

categoryCountyList takes the filtered list of key-value pairs as input and returns a dictionary where each key is a category string, and each value is a list of counties. The function also prints out the category with the most counties.

subcategoryCountyList takes the filtered list of key-value pairs as input and returns a dictionary where each key is a subcategory string, and each value is a list of counties. The function also prints out the subcategory with the most counties.

commonKeys This function takes two dictionaries as input and returns a new dictionary that contains the intersection of their keys and the union of their values.

freq_var This function takes a ranked list as input and returns a list of tuples where each tuple contains a county name and a list of variable names associated with that county. The ranked list is assumed to have a specific structure (with indices for county name, variable rank, variable name, and the list of counties associated with the variable), as documented in the function's code.
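A guess at how commonKeys works from the description above, intersecting the keys and unioning the values; the example dictionaries are hypothetical, and the real topVariableFunctions.py may differ in detail:

```python
# Sketch of commonKeys: keep only keys present in both dictionaries, and for
# each surviving key, merge the two county lists (duplicates removed).
def common_keys(d1, d2):
    return {k: sorted(set(d1[k]) | set(d2[k])) for k in d1.keys() & d2.keys()}

# Hypothetical category -> counties dictionaries
a = {"income": ["Adams", "Allen"], "travel": ["Ashland"]}
b = {"income": ["Allen", "Athens"], "poverty": ["Brown"]}
print(common_keys(a, b))  # only "income" appears in both; values are unioned
```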

In [10]:
from topVariableFunctions import filtered_var_pairs, categoryCountyList, subcategoryCountyList, commonKeys, freq_var
In [11]:
filt = filtered_var_pairs(allCounties)
In [12]:
categorycountyList = categoryCountyList(filt)
The category with the most counties is 'Mean travel time to work (minutes), workers age 16 years+, 2017-2021' with 26 counties.
In [13]:
subcategorycountyList = subcategoryCountyList(filt)
The subcategory with the most counties is 'Median household income (in 2021 dollars), 2017-2021' with 26 counties.
In [14]:
combinedVars = commonKeys(categorycountyList, subcategorycountyList)

After all variables are combined, we can take a look at some of the counties that may be most at risk and the most common variables among them.

Visualization Using Common Counties and Frequent Variables¶

Table of Demographic Variables and a List of Counties¶

In [15]:
sorted_var_dict = dict(sorted(combinedVars.items(), key=lambda x: len(x[1]), reverse=True))
In [16]:
# Make ranking table of vars.
ranked_listTbl = [(i+1, item[0], len(item[1]), ", ".join([textwrap.fill(', '.join(textwrap.wrap(c, width=10)), width=80) for c in item[1]])) for i, item in enumerate(sorted_var_dict.items())]
topVarTbl = pd.DataFrame(ranked_listTbl)
topVarTbl.columns=['Rank', 'Demographic_variable', 'Number_of_counties', 'List_of_Counties']
In [17]:
# Adjust the column widths of the table
headerColor = 'grey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'


fig = go.Figure(data=[go.Table(
    columnwidth = [50,110,75,700],

    header=dict(
        values=['Rank', 'Demographic Variable', 'Number of Counties', 'List of Counties'],
                    line_color='darkslategray',
                    fill_color=headerColor,
                    align=['left','center'],
                    font=dict(color='white', size=12)
    ),
    cells=dict(
        values=[topVarTbl.Rank, topVarTbl.Demographic_variable, topVarTbl.Number_of_counties, topVarTbl.List_of_Counties],
               line_color='darkslategray',
    # 2-D list of colors for alternating rows
    fill_color = [[rowOddColor,rowEvenColor,rowOddColor, rowEvenColor,rowOddColor]*5],
    align = ['center', 'center', 'center', 'left'],
    font = dict(color = 'darkslategray', size = 11)
    ))
])

fig.show()

The table above gives a ranking for each demographic variable in their respective subset of data, the number of counties, and a list of the counties.

Plotting County Frequency in Each Variable¶

Checking the values of each county that appeared in the table above.

In [18]:
# # Make ranking table of vars.
ranked_list = [(i+1, item[0], len(item[1]), item[1]) for i, item in enumerate(sorted_var_dict.items())]
newdf2 = pd.DataFrame(ranked_list)
newdf2.columns=['Rank', 'Demographic_variable', 'Number_of_counties', 'List_of_Counties']

Transforming newdf2 to become tidy:¶

In [19]:
# Separate the list of counties into individual rows
newdf2 = newdf2.explode('List_of_Counties')

# Drop the duplicate columns
newdf2 = newdf2.drop_duplicates(subset=['Rank', 'List_of_Counties'])

# Rename the columns for clarity
newdf3 = newdf2.rename(columns={
    'Demographic_variable': 'Demographic_Variable',
    'Number_of_counties': 'Number_of_Counties',
    'List_of_Counties': 'County'
})
In [20]:
# Melting OG data:
melted_df = pd.melt(cluster2_df, id_vars=['County Name'], var_name='Attribute', value_name='Value')
melted_df['County Name'] = melted_df['County Name'].apply(lambda x: x[:-13])
In [21]:
tidy_df = pd.merge(newdf3, melted_df[['County Name', 'Attribute', 'Value']], left_on=['County','Demographic_Variable'],
                   right_on=['County Name', 'Attribute'])
tidy_df.drop(['Attribute','County Name'], axis=1, inplace=True)

Frequency of Each County and Their Values Among the Most Common Variables Found¶

In [22]:
# create bar chart trace
fig = px.bar(tidy_df, x=tidy_df.Demographic_Variable, y=tidy_df.Number_of_Counties, text='Value', color='County',
             labels={"Attribute": "Attribute",
                     "Value": "Attribute Value",
                     "Number_of_Counties":'Count of each County',
                     'Demographic_Variable': 'Top Education and Labor Demographic Variables'
                 },
             title="Frequency of Each County With the Most Common Variables",)

fig.update_traces(textposition='inside')
#  update the layout to adjust the size of the plot
fig.update_layout(
    width=1200,  # width of the plot in pixels
    height=1400,  # height of the plot in pixels
)

# display the plot
fig.show()

Table of Counties and a list of Education and Labor Demographic Variables¶

In [23]:
var_freq = freq_var(ranked_list)
In [24]:
# var_freq to dictionary (renamed so the imported freq_var function is not shadowed)
var_freq_dict = {item[0]: item[1] for item in var_freq}
In [25]:
# sorting var_freq_dict dictionary
sorted_var_freq_dict = dict(sorted(var_freq_dict.items(), key=lambda x: len(x[1]), reverse=True))
In [26]:
# Make ranking table of vars.
ranked_var_freqViz = [(i+1, item[0], len(item[1]), ", ".join([textwrap.fill(' '.join(textwrap.wrap(c, width=10)), width=80) for c in item[1]])) for i, item in enumerate(sorted_var_freq_dict.items())]
top_var_freqdfViz = pd.DataFrame(ranked_var_freqViz)
top_var_freqdfViz.columns=['Rank', 'County', 'Number_of_variables', 'List_of_variables']
In [27]:
headerColor = 'grey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'


fig = go.Figure(data=[go.Table(
    columnwidth = [40,70,100,600],

    header=dict(
        values=['Rank', 'County Name', 'Number of Demographic Variables', 'List of Demographics'],
                    line_color='darkslategray',
                    fill_color=headerColor,
                    align=['center','center', 'center', 'left'],
                    font=dict(color='white', size=12)
    ),
    cells=dict(
        values=[top_var_freqdfViz.Rank, top_var_freqdfViz.County, top_var_freqdfViz.Number_of_variables,
                top_var_freqdfViz.List_of_variables],
               line_color='darkslategray',
    # 2-D list of colors for alternating rows
    fill_color = [[rowOddColor,rowEvenColor,rowOddColor, rowEvenColor,rowOddColor]*5],
    align = ['center', 'center', 'center', 'left'],
    font = dict(color = 'darkslategray', size = 11)
    ))
])

fig.show()

Tidying for plot¶

In [28]:
# # Make ranking table of vars.
ranked_vars_list = [(i+1, item[0], len(item[1]), item[1]) for i, item in enumerate(sorted_var_freq_dict.items())]
top_var_freqdf = pd.DataFrame(ranked_vars_list)
top_var_freqdf.columns=['Rank', 'County', 'Number_of_variables', 'List_of_variables']
In [29]:
# Separate the list of counties into individual rows
test2 = top_var_freqdf.explode('List_of_variables')

# Drop the duplicate columns
test2 = test2.drop_duplicates(subset=['Rank', 'List_of_variables'])

# Rename the columns for clarity
test5 = test2.rename(columns={
    'List_of_variables': 'Demographic_Variable',
    'Number_of_variables': 'Number_of_variables',
    'County': 'County'
})

# Display the updated data frame
# test5.head()
In [30]:
tidy_df2 = pd.merge(test5, melted_df[['County Name', 'Attribute', 'Value']], left_on=['County','Demographic_Variable'],
                   right_on=['County Name', 'Attribute'])

Frequency of Each Labor and Education Demographic Variable and Their Values Among the Most Common Counties Found¶

In [31]:
# create bar chart trace
fig = px.bar(tidy_df2, x=tidy_df2.County, color='Demographic_Variable',text='Value',
             labels={"Attribute": "Attribute",
                     "Value": "Attribute Value",
                     "Demographic_Variable":"Demographic Variables"
                 },
             title="Frequency of Each Education and Labor Demographic Variable With the Most Common Counties",)

fig.update_traces(textposition='inside')
# update the layout to adjust the size of the plot
fig.update_layout(
    width=1500,  # width of the plot in pixels
    height=500,  # height of the plot in pixels
)

# display the plot
fig.show()

Most Frequent Demographic Variables¶

In [32]:
## Most common variables
res = sum(sorted_var_freq_dict.values(), [])
mostCommonVar = list(set(res))
mostCommonVar
Out[32]:
['Median household income (in 2021 dollars), 2017-2021',
 'Total annual payroll, 2020 ($1,000)',
 'Total employment, 2020',
 'In civilian labor force, total of population age 16 years+, 2017-2021',
 'With a disability, under age 65 years, 2017-2021',
 'Per capita income in past 12 months (in 2021 dollars), 2017-2021',
 'Population per square mile, 2020',
 'Total employer establishments, 2020',
 'Persons without health insurance, under age 65 years',
 'Persons in poverty',
 'In civilian labor force, female of population age 16 years+, 2017-2021',
 'Mean travel time to work (minutes), workers age 16 years+, 2017-2021',
 'Total retail sales per capita, 2017']

Most Frequent Counties¶

In [33]:
mostCommonCounties = top_var_freqdfViz['County'].tolist()
print(mostCommonCounties)
['Sandusky', 'Ashland', 'Huron', 'Fulton', 'Muskingum', 'Shelby', 'Belmont', 'Marion', 'Washington', 'Darke', 'Jefferson', 'Clinton', 'Tuscarawas', 'Guernsey', 'Pickaway', 'Wayne', 'Erie', 'Greene', 'Defiance', 'Columbiana', 'Athens', 'Ross', 'Mercer', 'Lawrence', 'Trumbull', 'Van Wert', 'Williams', 'Clark', 'Wyandot', 'Ashtabula', 'Scioto', 'Licking', 'Warren', 'Wood', 'Paulding', 'Henry', 'Richland', 'Champaign', 'Ottawa', 'Miami', 'Preble', 'Holmes', 'Clermont', 'Delaware', 'Carroll', 'Morrow', 'Hardin', 'Coshocton', 'Brown', 'Lake', 'Portage', 'Fairfield', 'Highland']

Plotting the Counties at Risk on a Map of Ohio¶

Making an Excel file with the most frequent Labor and Education variables¶

In [34]:
counties_ohio = [county + ' County, Ohio' for county in mostCommonCounties]
In [35]:
y = cluster2_df[cluster2_df['County Name'].isin(counties_ohio)]
mostFreqcountyVarTbl = y.iloc[:,:-3].copy()
mostFreqcountyVarTbl['County'] = y.iloc[:, -1]
mostFreqcountyVarTbl.head()
mostFreqcountyVarTbl.to_excel('mostFreqLaborEduData.xlsx')
In [36]:
# Creating a new data frame with only the most common variables
mapPlotdf = cluster2_df[list(combinedVars.keys())].copy()
In [37]:
with urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json') as response:
    counties = json.load(response)

df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/minoritymajority.csv',
                   dtype={"fips": str})

# Filter the data frame to include only Ohio
df_ohio = df[df['STNAME'] == 'Ohio']

# Filter the counties GeoJSON file to include only Ohio counties
counties_ohio = {'type': 'FeatureCollection', 'features': []}
for feature in counties['features']:
    if feature['id'][:2] == '39':
        counties_ohio['features'].append(feature)
In [38]:
# Add a temporary county-name column (strip the ", Ohio" suffix to match CTYNAME)
mapPlotdf['TempCounty Name'] = cluster2_df['County Name'].apply(lambda x: x[:-6])
In [39]:
# merge the two data frames based on the 'County Name' and 'CTYNAME' columns
merged_df = pd.merge(mapPlotdf, df_ohio[['CTYNAME', 'FIPS']], left_on='TempCounty Name', right_on='CTYNAME')

# drop the duplicate 'CTYNAME' column
merged_df.drop('CTYNAME', axis=1, inplace=True)
In [40]:
# Strip the " County" suffix so names like "Adams County" become "Adams"
merged_df['TempCounty Name'] = merged_df['TempCounty Name'].apply(lambda x: x[:-7])
In [41]:
# Example list of counties to check
counties_to_check = merged_df['TempCounty Name'].tolist()

# Create a list to hold safe counties
safe_counties = []

# Check each county and append to the safe_counties list if not in either of the two lists
for county in counties_to_check:
    if county not in mostCommonCounties and county not in bannedCountiesList:
        safe_counties.append(county)

# Print the list of safe counties
print(safe_counties)
['Adams', 'Cuyahoga', 'Fayette', 'Franklin', 'Gallia', 'Geauga', 'Hamilton', 'Harrison', 'Hocking', 'Jackson', 'Lorain', 'Lucas', 'Madison', 'Mahoning', 'Meigs', 'Monroe', 'Montgomery', 'Morgan', 'Noble', 'Perry', 'Pike', 'Putnam', 'Stark', 'Vinton', 'Summit']
In [42]:
merged_df['risk'] = merged_df['TempCounty Name'].apply(lambda x: 'At risk' if x in mostCommonCounties else
                                                      'Already has Banned' if x in bannedCountiesList else
                                                      'Safe')
In [43]:
# One layer
# import plotly.express as px
fig = px.choropleth(merged_df, geojson=counties_ohio, locations='FIPS', color='risk',
                    scope="usa",
                    hover_data=["TempCounty Name",'Median household income (in 2021 dollars), 2017-2021',
                                'Per capita income in past 12 months (in 2021 dollars), 2017-2021',
                                'Mean travel time to work (minutes), workers age 16 years+, 2017-2021'],
                    labels={'risk':'Risk Level',
                           "TempCounty Name":'County',
                            'Median household income (in 2021 dollars), 2017-2021': 'Median Household Income',
                            'Per capita income in past 12 months (in 2021 dollars), 2017-2021':'Per Capita Income (Past 12 Months)',
                            'Mean travel time to work (minutes), workers age 16 years+, 2017-2021':'Mean Travel Time to Work (Ages 16+)'},
                    color_discrete_map={'Already has Banned': 'red',
                                        'At risk': 'blue', 'Safe': 'green'})

                          
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})


fig.show()